Long Sentence Partitioning using Structure Analysis for Machine Translation

Authors

  • Yoon-Hyung Roh
  • Young Ae Seo
  • Ki-Young Lee
  • Sung-Kwon Choi
Abstract

In machine translation, long sentences are usually considered difficult to handle. The main reason is syntactic ambiguity, which increases explosively as a sentence becomes longer. In particular, in machine translation using sentence patterns, a long sentence causes a critical coverage problem. In this paper, we present a sentence partitioning method that recognizes sub-sentence ranges through structure analysis, reducing the length of the sentences to be translated. To analyze the clausal structure, we employ phrase-level sentence patterns, which have little syntactic ambiguity. The structure analysis consists of recognizing the starting points of all clauses, dependency analysis, and depth analysis. The ranges of sub-sentences are then extracted in stages based on depth. Our method was evaluated on 108 sentences extracted from CNN transcripts and showed 85.2% accuracy in the detection of simple sentences.
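
The depth-based extraction step can be illustrated with a minimal sketch. It is not the authors' implementation: the `Clause` record, the functions `clause_depths` and `partition_by_depth`, the example sentence, and the hand-assigned clause starts and parent links are all assumptions made for illustration. The sketch also assumes that clause starting points and dependencies have already been recognized, and that each clause simply runs until the next clause start.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Clause:
    start: int             # token index where the clause begins (assumed given)
    parent: Optional[int]  # index of the governing clause in the list, None for the root


def clause_depths(clauses: List[Clause]) -> List[int]:
    """Depth of each clause = number of parent links up to the root clause."""
    depths = []
    for clause in clauses:
        depth, parent = 0, clause.parent
        while parent is not None:
            depth += 1
            parent = clauses[parent].parent
        depths.append(depth)
    return depths


def partition_by_depth(tokens: List[str], clauses: List[Clause]) -> Dict[int, List[Tuple[int, int]]]:
    """Group half-open token ranges (start, end) by clause depth.

    Simplifying assumption: a clause ends where the next clause starts.
    """
    starts = [c.start for c in clauses]
    ends = starts[1:] + [len(tokens)]
    stages: Dict[int, List[Tuple[int, int]]] = {}
    for start, end, depth in zip(starts, ends, clause_depths(clauses)):
        stages.setdefault(depth, []).append((start, end))
    return stages


# Hypothetical example: clause starts and parents are assigned by hand.
tokens = "The president said that he would sign the bill when it reaches his desk".split()
clauses = [Clause(0, None), Clause(3, 0), Clause(9, 1)]

for depth, ranges in sorted(partition_by_depth(tokens, clauses).items()):
    for start, end in ranges:
        print(depth, " ".join(tokens[start:end]))
```

Processing the ranges one depth at a time mirrors the staged extraction described in the abstract; a real system would also need to find clause end points so that embedded clauses can be nested inside their parents.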

Similar resources

A Hybrid Machine Translation System Based on a Monotone Decoder

In this paper, a hybrid Machine Translation (MT) system is proposed by combining the result of a rule-based machine translation (RBMT) system with a statistical approach. The RBMT uses a set of linguistic rules for translation, which leads to better translation results in terms of word ordering and syntactic structure. On the other hand, SMT works better in lexical choice. Therefore, in our sys...

For the Proper Treatment of Long Sentences in a Sentence Pattern-based English-Korean MT System

This paper describes a sentence pattern-based English-Korean machine translation system backed up by a rule-based module as a solution to the translation of long sentences. A rule-based English-Korean MT system typically suffers from low translation accuracy for long sentences due to poor parsing performance. In the proposed method we only use chunking information on the phrase-level of the parse...

Sentence Segmentation Using IBM Word Alignment Model 1

In statistical machine translation, word alignment models are trained on bilingual corpora. Long sentences pose severe problems: 1. the high computational requirements; 2. the poor quality of the resulting word alignment. We present a sentence-segmentation method that solves these problems by splitting long sentence pairs. Our approach uses the lexicon information to locate the optimal split po...

Sub-Sentence Division for Tree-Based Machine Translation

Tree-based statistical machine translation models have made significant progress in recent years, especially when replacing 1-best trees with packed forests. However, as parsing accuracy usually drops dramatically as sentence length increases, translating long sentences often takes a long time and produces only degenerate translations. We propose a new method named subsentence div...

Transformation-based Sentence Splitting method for Statistical Machine Translation

We propose a transformation-based sentence splitting method for statistical machine translation. Transformations are automatically obtained from a manually split corpus and then expanded to improve machine translation quality. Through a series of experiments, we show that transformation-based sentence splitting is an effective pre-processing step for long-sentence translation.

Publication date: 2001